

Search for: All records

Creators/Authors contains: "Zhu, He"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full-text articles may not be available free of charge during the embargo (administrative interval).

Some links on this page may take you to non-federal websites. Their policies may differ from those of this site.

  1. Over the past decade, deep reinforcement learning (RL) techniques have significantly advanced robotic systems. However, due to the complex architectures of neural network models, ensuring their trustworthiness is a considerable challenge. Programmatic reinforcement learning has surfaced as a promising approach. Nonetheless, synthesizing robot-control programs remains challenging. Existing methods rely on domain-specific languages (DSLs) populated with user-defined state abstraction predicates and a library of low-level controllers as abstract actions to bootstrap synthesis, which is impractical in unknown environments that lack such predefined components. To address this limitation, we introduce RoboScribe, a novel abstraction refinement-guided program synthesis framework that automatically derives robot state and action abstractions from raw, unsegmented task demonstrations in high-dimensional, continuous spaces. It iteratively enriches and refines an initially coarse abstraction until it generates a task-solving program over the abstracted robot environment. RoboScribe is effective in synthesizing iterative programs by inferring recurring subroutines directly from the robot's raw, continuous state and action spaces, without needing predefined abstractions. Experimental results show that RoboScribe programs inductively generalize to long-horizon robot tasks involving arbitrary numbers of objects, outperforming baseline methods in terms of both interpretability and efficiency.
    Free, publicly-accessible full text available October 1, 2026
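    The refinement loop described in this abstract can be illustrated with a small sketch. This is a toy, not RoboScribe itself: states are one-dimensional, the state abstraction is a set of threshold predicates, and the loop splits a region whenever two demonstration states inside it demand different actions; the demonstration data and all names are invented.

    ```python
    # Toy abstraction refinement from demonstrations (illustrative only).
    # Start from a single coarse region; whenever two demo states in the
    # same region require different actions, split that region at the
    # midpoint between them, until a region->action program is consistent
    # with all demonstrations.
    from bisect import bisect_right

    def synthesize(demos):
        """demos: list of (state, action). Returns (boundaries, program)."""
        boundaries = []              # interior thresholds defining regions
        while True:
            region = lambda s: bisect_right(boundaries, s)
            program = {}             # abstract region index -> action
            conflict = None
            for s, a in demos:
                r = region(s)
                if r in program and program[r] != a:
                    # locate a demo state in the same region with the old action
                    other = next(t for t, b in demos
                                 if region(t) == r and b == program[r])
                    conflict = (s, other)
                    break
                program[r] = a
            if conflict is None:
                return boundaries, program
            boundaries.append((conflict[0] + conflict[1]) / 2)  # refine
            boundaries.sort()

    # Demonstrations from an unknown two-mode controller:
    demos = [(0.1, "reach"), (0.3, "reach"), (0.7, "grasp"), (0.9, "grasp")]
    bounds, prog = synthesize(demos)
    ```

    On this data one split suffices, yielding a two-region program; real systems must of course handle high-dimensional states and action abstractions as well.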
  2. We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements.
    Free, publicly-accessible full text available July 21, 2026
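    The decomposition idea can be sketched concretely. This is an illustrative toy, not the paper's framework: the environment is a 1-D line, the infinite-horizon specification is "always avoid state 5, repeatedly reach goal 4 then goal 8", and hand-written finite-horizon controllers stand in for learned RL subtask policies.

    ```python
    # Toy reach-avoid decomposition (illustrative only): an infinite-horizon
    # spec is split into finite-horizon subtasks, one per goal; the composed
    # controller cycles through the subtask policies forever.
    AVOID = {5}                     # states the agent must never enter

    def subtask_policy(goal):
        """Finite-horizon controller: step toward goal, detouring around AVOID."""
        def act(s):
            step = 1 if goal > s else -1
            return s + (2 * step if s + step in AVOID else step)
        return act

    def run(s, goals, steps):
        """Compose subtask policies in a cycle; return the visited states."""
        trace, idx = [], 0
        for _ in range(steps):
            pol, goal = subtask_policy(goals[idx]), goals[idx]
            s = pol(s)
            trace.append(s)
            if s == goal:
                idx = (idx + 1) % len(goals)   # advance to the next subtask
        return trace

    trace = run(0, goals=[4, 8], steps=50)
    ```

    Because each subtask controller respects the avoid set, the infinite composition does too; the paper's contribution is proving this compositional guarantee formally rather than by construction, as here.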
  3. Abstract We propose a deductive synthesis framework for constructing reinforcement learning (RL) agents that provably satisfy temporal reach-avoid specifications over infinite horizons. Our approach decomposes these temporal specifications into a sequence of finite-horizon subtasks, for which we synthesize individual RL policies. Using formal verification techniques, we ensure that the composition of a finite number of subtask policies guarantees satisfaction of the overall specification over infinite horizons. Experimental results on a suite of benchmarks show that our synthesized agents outperform standard RL methods in both task performance and compliance with safety and temporal requirements. 
    Free, publicly-accessible full text available July 23, 2026
  4. We introduce VELM, a reinforcement learning (RL) framework grounded in verification principles for safe exploration in unknown environments. VELM ensures that an RL agent systematically explores its environment, adhering to safety properties throughout the learning process. VELM learns environment models as symbolic formulas and conducts formal reachability analysis over the learned models for safety verification. An online shielding layer is then constructed to confine the RL agent’s exploration solely within a state space verified as safe in the learned model, thereby bolstering the overall safety profile of the RL system. Our experimental results demonstrate the efficacy of VELM across diverse RL environments, highlighting its capacity to significantly reduce safety violations in comparison to existing safe learning techniques, all without compromising the RL agent’s reward performance. 
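    The shielding idea can be sketched in a few lines. This is an illustrative toy, not VELM: the "learned" symbolic model is the exact 1-D dynamics s' = s + a, the verified-safe region is a fixed interval, and the reachability check is a single model step; all values are invented.

    ```python
    # Toy online shield (illustrative only): before executing the agent's
    # proposed action, check via one-step reachability in the learned model
    # that the successor stays in the verified-safe interval; otherwise
    # substitute a safe fallback action (here: 0, i.e. stay put).
    SAFE = (-1.0, 1.0)              # interval verified safe in the learned model

    def model(s, a):                # learned symbolic dynamics (assumed exact here)
        return s + a

    def shield(s, a, fallback=0.0):
        nxt = model(s, a)
        return a if SAFE[0] <= nxt <= SAFE[1] else fallback

    # An aggressive agent proposes large steps; the shield confines exploration.
    s, violations = 0.0, 0
    for proposed in [0.5, 0.75, -0.25, 0.75, -2.0]:
        a = shield(s, proposed)
        s = model(s, a)
        violations += not (SAFE[0] <= s <= SAFE[1])
    ```

    Every executed step stays inside the verified region, so the violation count remains zero regardless of what the agent proposes.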
  5. Deep reinforcement learning (RL) has led to encouraging successes in numerous challenging robotics applications. However, the lack of inductive biases to support logic deduction and generalization in the representation of a deep RL model makes it less effective at exploring complex long-horizon robot-control tasks with sparse reward signals. Existing program synthesis algorithms for RL problems inherit the same limitation, as they either adapt conventional RL algorithms to guide program search or synthesize robot-control programs to imitate an RL model. We propose ReGuS, a reward-guided synthesis paradigm, to unlock the potential of program synthesis to overcome the exploration challenges. We develop a novel hierarchical synthesis algorithm with decomposed search space for loops, on-demand synthesis of conditional statements, and curriculum synthesis for procedure calls, to effectively compress the exploration space for long-horizon, multi-stage, and procedural robot-control tasks that are difficult to address by conventional RL techniques. Experimental results demonstrate that ReGuS significantly outperforms state-of-the-art RL algorithms and standard program synthesis baselines on challenging robot tasks including autonomous driving, locomotion control, and object manipulation. CCS Concepts: • Software and its engineering → Automatic programming.
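    The core idea of reward-guided program search can be illustrated with a minimal sketch. This is a toy, not ReGuS: the DSL has only "move" and "turn" primitives with bounded repeat counts, the environment is a short corridor, and candidates are ranked by the reward they earn rather than by a gradient; all names and numbers are invented.

    ```python
    # Toy reward-guided enumerative program search (illustrative only).
    # World: walk 5 cells, then turn once to finish. Programs are lists of
    # (primitive, repeat-count) pairs; environment reward scores candidates.
    from itertools import product

    def run_program(prog, length=5):
        """Execute a program; reward = cells advanced + bonus for the final turn."""
        pos, turned, reward = 0, False, 0
        for op, n in prog:
            for _ in range(n):
                if op == "move" and pos < length and not turned:
                    pos += 1; reward += 1
                elif op == "turn" and pos == length and not turned:
                    turned = True; reward += 10
        return reward

    def synthesize(max_reps=6):
        candidates = [[("move", i), ("turn", j)]
                      for i, j in product(range(1, max_reps + 1), repeat=2)]
        return max(candidates, key=run_program)   # reward guides the search

    best = synthesize()
    ```

    The reward signal prunes the candidate space directly; ReGuS layers hierarchical decomposition and curricula on top of this basic principle to scale far beyond brute-force enumeration.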
  6. Goal-conditioned reinforcement learning (RL) is a powerful approach for learning general-purpose skills by reaching diverse goals. However, it has limitations when it comes to task-conditioned policies, where goals are specified by temporally extended instructions written in the Linear Temporal Logic (LTL) formal language. Existing approaches for finding LTL-satisfying policies rely on sampling a large set of LTL instructions during training to adapt to unseen tasks at inference time. However, these approaches do not guarantee generalization to out-of-distribution LTL objectives, which may have increased complexity. In this paper, we propose a novel approach to address this challenge. We show that simple goal-conditioned RL agents can be instructed to follow arbitrary LTL specifications without additional training over the LTL task space. Unlike existing approaches that focus on LTL specifications expressible as regular expressions, our technique is unrestricted and generalizes to ω-regular expressions. Experimental results demonstrate the effectiveness of our approach in adapting goal-conditioned RL agents to satisfy complex temporal logic task specifications zero-shot.
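    The zero-shot idea can be sketched for a sequential LTL fragment. This is an illustrative toy, not the paper's method: a hand-built automaton for F(a ∧ F b) ("eventually reach a, then eventually reach b") tracks the pending subformula and hands the corresponding goal to a generic goal-conditioned policy, which here is a trivial 1-D goal-reaching stub; no retraining over LTL tasks is involved.

    ```python
    # Toy LTL-to-goal-sequence steering (illustrative only): an automaton
    # for the sequential formula F(a & F b) drives a pretrained
    # goal-conditioned policy by emitting one goal per automaton state.
    def goal_policy(s, g):          # stand-in for a pretrained goal-reaching policy
        return 1 if g > s else -1 if g < s else 0

    REGIONS = {"a": 3, "b": -2}     # atomic propositions as target states

    def run_ltl(s, formula=("a", "b"), max_steps=30):
        q = 0                       # automaton state: index of pending proposition
        for _ in range(max_steps):
            if q == len(formula):   # accepting state: formula satisfied
                return True, s
            g = REGIONS[formula[q]]
            s += goal_policy(s, g)
            if s == g:
                q += 1              # automaton transition on satisfying the goal
        return q == len(formula), s

    ok, final = run_ltl(0)
    ```

    General LTL (and ω-regular) objectives require a full automaton product rather than a linear goal sequence, but the division of labor is the same: the automaton supplies goals, the goal-conditioned agent reaches them.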
  7. Abstract Microbes are the drivers of soil phosphorus (P) cycling in terrestrial ecosystems; however, the role of soil microbes in mediating P cycling in P‐rich soils during primary succession remains uncertain. This study examined the impacts of bacterial community structure (diversity and composition) and its functional potential (absolute abundances of P‐cycling functional genes) on soil P cycling along a 130‐year glacial chronosequence on the eastern Tibetan Plateau. Bacterial community structure was a better predictor of soil P fractions than P‐cycling genes along the chronosequence. After glacier retreat, the solubilization of inorganic P and the mineralization of organic P were significantly enhanced by increased bacterial diversity, changed interspecific interactions, and abundant species involved in soil P mineralization, thereby increasing P availability. Although 84% of P‐cycling genes were associated with organic P mineralization, these genes were more closely associated with soil organic carbon than with organic P. Bacterial carbon demand probably determined soil P turnover, indicating the dominant role of organic matter decomposition processes in P‐rich alpine soils. Moreover, the significant decrease in the complexity of the bacterial co‐occurrence network and the taxa‐gene‐P network at the later stage indicates a declining dominance of the bacterial community in driving soil P cycling with succession. Our results reveal that bacteria with a complex community structure have a prominent potential for biogeochemical P cycling in P‐rich soils during the early stages of primary succession. 
  8. Abstract Background: Timber harvesting and industrial wood processing laterally transfer the carbon stored in forest sectors to wood products, creating a wood products carbon pool. The carbon stored in wood products is allocated to end-use wood products (e.g., paper, furniture), landfill, and charcoal. Wood products can store substantial amounts of carbon and contribute to the mitigation of greenhouse effects. Therefore, accurate accounting of the size of wood products carbon pools for different regions is essential to estimating the land-atmosphere carbon exchange using the bottom-up approach of carbon stock change. Results: To quantify the carbon stored in wood products, we developed a state-of-the-art estimator (Wood Products Carbon Storage Estimator, WPsCS Estimator) that includes the wood products disposal, recycling, and waste wood decomposition processes. The wood products carbon pool in this estimator has three subpools: (1) end-use wood products, (2) landfill, and (3) charcoal carbon. In addition, it has a user-friendly interface, which can be used to easily parameterize and calibrate an estimation. To evaluate its performance, we applied this estimator to account for the carbon stored in wood products made from the timber harvested in Maine, USA, and the carbon storage of wood products consumed in the United States. Conclusion: The WPsCS Estimator can efficiently and easily quantify the carbon stored in harvested wood products for a given region over a specific period, which was demonstrated with two illustrative examples. In addition, WPsCS Estimator has a user-friendly interface, and all parameters can be easily modified.
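    The three-subpool bookkeeping described here can be sketched as a simple first-order decay model. This is a toy, not WPsCS Estimator: all rates, allocation fractions, and the inert treatment of charcoal are invented for illustration only.

    ```python
    # Toy wood-products carbon bookkeeping (illustrative only; all rates
    # are invented). Each year, end-use products lose carbon at a
    # first-order rate; a fraction of that outflow is landfilled and the
    # rest is emitted; the landfill subpool decays more slowly; charcoal
    # is treated as inert here.
    def simulate(harvest_per_year, years, k_use=0.05, k_landfill=0.01,
                 to_landfill=0.6, to_charcoal=0.02):
        use = landfill = charcoal = 0.0
        for _ in range(years):
            charcoal += harvest_per_year * to_charcoal
            use += harvest_per_year * (1 - to_charcoal)
            retired = use * k_use              # carbon leaving end-use products
            use -= retired
            landfill += retired * to_landfill  # remainder is emitted
            landfill -= landfill * k_landfill  # slow landfill decomposition
        return {"end_use": use, "landfill": landfill, "charcoal": charcoal}

    pools = simulate(harvest_per_year=100.0, years=50)
    ```

    Under constant harvest the end-use subpool approaches a steady state where annual inflow balances first-order outflow, which is why the accounting period matters for regional estimates.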
  9. We present a verification-based learning framework VEL that synthesizes safe programmatic controllers for environments with continuous state and action spaces. The key idea is the integration of program reasoning techniques into controller training loops. VEL performs abstraction-based program verification to reason about a programmatic controller and its environment as a closed-loop system. Based on a novel verification-guided synthesis loop for training, VEL minimizes the amount of safety violation in the proof space of the system, which approximates the worst-case safety loss, using gradient-descent style optimization. Experimental results demonstrate the substantial benefits of leveraging verification feedback for synthesizing provably correct controllers. 
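    The training loop described here can be illustrated with a minimal sketch. This is a toy, not VEL: the closed-loop system is s' = s + u with programmatic controller u = -theta * s, the "proof space" is one-step interval reachability from the initial set [0.5, 1.0], safety means staying in [-1, 1] for 10 abstract steps, and the worst-case safety loss is minimized by finite-difference gradient descent; all constants are invented.

    ```python
    # Toy verification-guided controller training (illustrative only):
    # compute a safety loss from an interval reachability proof and shrink
    # it by gradient descent on the controller parameter theta.
    def reach_step(lo, hi, theta):
        """Interval image of s -> s + (-theta * s) = (1 - theta) * s."""
        c = 1.0 - theta
        a, b = c * lo, c * hi
        return min(a, b), max(a, b)

    def safety_loss(theta, lo=0.5, hi=1.0, bound=1.0, steps=10):
        """Total exceedance of the reachable intervals beyond [-bound, bound]."""
        loss = 0.0
        for _ in range(steps):
            lo, hi = reach_step(lo, hi, theta)
            loss += max(0.0, hi - bound) + max(0.0, -bound - lo)
        return loss

    theta, lr, eps = -0.1, 0.002, 1e-4   # theta = -0.1 is unsafe: states grow
    for _ in range(200):                  # small lr keeps this toy stable
        grad = (safety_loss(theta + eps) - safety_loss(theta - eps)) / (2 * eps)
        theta -= lr * grad
    ```

    Descent drives theta positive, making the closed loop a contraction whose reachable intervals stay inside the safe set, so the verified loss reaches zero; VEL applies the same principle with real program verification in place of this toy interval analysis.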